Abstract

In practical econometric analysis we are very often faced with the
problem of how to specify structural equations. The conventional t-test
of coefficients is apparently inappropriate. The largest root, say $\lambda$,
of a certain determinantal equation provides us with a basis for the
test of over-identifying restrictions. The preliminary test, based on
$\lambda$, may give us a possible decision rule for the choice of the most
adequate structural equation from given nested alternatives. However,
ambiguity remains about how to choose the significance level. As an
alternative procedure, we apply the minimum Akaike Information Criterion
to our problem. This gives us a quite simple decision rule based on the
comparison of $\lambda$'s. Moreover, we propose another decision rule called
the unbiased decision rule; unbiased in the sense that we reach a
correct decision with probability greater than one-half. Applications of
these newly developed procedures are exemplified by Klein's Model I.


1.   Introduction 

In recent years, much emphasis has been laid on the problem of
statistical model identification: how to identify a model statistically
when it cannot be completely specified on a priori grounds. In fact,
a considerable amount of work has been done in the last decade on
the choice of the most adequate regression model. The purpose
of the present paper is to extend the statistical procedures developed
for the choice of regression models to a simultaneous equations system.
When we discuss model identification, we must fix the idea of the
adequacy of a model. That is, we need to introduce a suitable measure
of the discrepancy or the distance of a model from the unknown true
structure. Different measures lead us to different procedures, of
course.

It  is  ordinarily  expected  that  the  more  complicated  model  will 
provide  the  better  approximation  to  reality.   However,  on  the  contrary, 
the  less  complicated  model  would  be  preferred  if  we  wish  to  pursue 
accuracy  of  estimation.   In  general,  closeness  to  the  truth  is 
quite  likely  to  be  incompatible  with  parsimony  of  parameters.   That 
is,  if  one  pursues  one  of  the  criteria,  the  other  must  be  necessarily 
sacrificed. 

Akaike [1] has proposed a widely applicable statistic that incorporates
these two criteria ingeniously. As it is based on Kullback-Leibler's
information measure for discrimination of two probability
distributions, Akaike's statistic is called the Akaike Information
Criterion and is abbreviated as the AIC. It is defined as minus twice
the logarithm of the maximized likelihood function plus twice the number
of parameters in a model. (See equation (3.1).) Given a set of
alternative models, we choose the one which gives the smallest AIC. The
procedure is called the Minimum AIC (MAIC). The advantage of this
procedure is its applicability to any statistical problem so long as
each of the alternative models has a well-defined likelihood function.
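
As a minimal sketch of the MAIC rule just described, assume that each candidate model is summarized only by its maximized log-likelihood and its number of parameters. The function names and the numbers below are hypothetical and serve only to illustrate the trade-off between fit and parsimony.

```python
def aic(max_log_likelihood, n_params):
    """AIC = -2 log(maximized likelihood) + 2 (number of parameters)."""
    return -2.0 * max_log_likelihood + 2.0 * n_params


def maic_choice(models):
    """Return the name of the model with the smallest AIC.

    `models` maps a model name to (maximized log-likelihood, number of parameters).
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical candidates: a richer model fits better but pays a larger penalty.
candidates = {
    "small": (-104.2, 2),   # AIC = 208.4 + 4  = 212.4
    "medium": (-101.0, 4),  # AIC = 202.0 + 8  = 210.0
    "large": (-100.6, 7),   # AIC = 201.2 + 14 = 215.2
}
best = maic_choice(candidates)
```

In this illustration the intermediate model is chosen: the largest model improves the fit too little to offset its additional parameters.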

Following Akaike, Sawa [8] has recently developed another information
criterion aimed specifically at the choice of linear regression
models. This criterion is also based on Kullback-Leibler's information
criterion.

Mallows [7] proposed a criterion called the $C_p$ statistic which
defines another procedure for selecting the optimal linear regression
model. The $C_p$ statistic is defined to be the residual sum of squares
(RSS) plus twice the number of parameters (p) multiplied by an unbiased
estimate $\hat{\omega}^2$ of the true variance of error terms:

(1.1)         $C_p = \mathrm{RSS} + 2p\,\hat{\omega}^2 .$

Obviously, the first term measures the accuracy of a model, and the
second term stands for the penalty paid for increasing the number of
parameters. We note that application of the MAIC to linear regression
yields a decision rule asymptotically equivalent to Mallows' $C_p$.
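
Equation (1.1) can be evaluated directly once the residual sums of squares are known. The sketch below is hypothetical: the RSS values, the parameter counts, and the variance estimate are made-up inputs, and the statistic is computed in the form (1.1) as stated here, not in any rescaled variant.

```python
def mallows_cp(rss, n_params, omega2_hat):
    """C_p = RSS + 2 p omega_hat^2, in the form of equation (1.1)."""
    return rss + 2.0 * n_params * omega2_hat


# Hypothetical nested regressions: name -> (RSS, number of parameters p).
fits = {"p=2": (58.0, 2), "p=3": (41.0, 3), "p=4": (40.2, 4)}
omega2_hat = 2.0  # assumed unbiased estimate of the error variance

cp = {name: mallows_cp(rss, p, omega2_hat) for name, (rss, p) in fits.items()}
chosen = min(cp, key=cp.get)  # regression with the smallest C_p
```

Here the third parameter reduces the RSS by far more than the penalty $2\hat{\omega}^2$, while the fourth does not, so the three-parameter regression is selected.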

Sawa and Takeuchi [9] proposed another criterion for choosing an
optimal regression equation. The decision rule defined by this criterion
is called the unbiased decision rule: unbiased in the sense that it
leads us to the choice of the most adequate model with probability
greater than one-half, when we compare two alternatives.

In Section 2, models and notations are described. In Section 3,
we develop the MAIC procedure for selecting the most adequate structural
equation. This gives us a quite simple criterion, so long as we define
the AIC in terms of the concentrated likelihood function for the limited
information maximum likelihood estimate (Anderson and Rubin [2]).
Moreover, the implication of the MAIC procedure will be discussed in the
context of conventional hypothesis testing. In Section 4, we define a
specification error of a structural equation in terms of identification
conditions. We examine the distribution of the AIC criterion when both of
the structural equations being compared are incorrectly specified.
In Section 5, we propose a Mallows-type risk function of postulating a
particular structural equation as a model. Based on this risk function
and the distribution theory developed in Section 4, the unbiased decision
rule is derived. Critical points of the unbiased decision rule are
numerically evaluated and tabulated. In Section 6, a numerical example
will be given.

2.  Models  and  Notations 

Suppose N alternative structural equations are given, and we face
the problem of how to identify the most adequate one among them.
The i-th equation is written as

(2.1)         $y = Y_i \beta_i + Z_i \gamma_i + \sigma u_i , \quad i = 1, \ldots, N ,$

where $y$ and $Y_i$ are $T \times 1$ and $T \times G_i$ matrices, respectively, of
observations on the endogenous variables; $Z_i$ is a $T \times K_i$ matrix of
observations on the $K_i$ exogenous variables; $\beta_i$ and $\gamma_i$ are,
respectively, $G_i$-dimensional and $K_i$-dimensional column vectors of
unknown parameters; $u_i$ is a $T$-dimensional column vector of
disturbances. Note that every alternative equation shares a common
explained endogenous variable. The components of $u_i$ are independently
normally distributed with mean 0 and unit variance, and $\sigma$ is a (small)
positive number. The reduced form of the complete system of equations
includes


(2.2)         $y = Z\pi^* + \sigma v = Z_i \pi_i + \bar{Z}_i \bar{\pi}_i + \sigma v ,$

(2.3)         $Y_i = Z\Pi_i^* + \sigma V_i = Z_i \Pi_i + \bar{Z}_i \bar{\Pi}_i + \sigma V_i ,$

where $Z$ is a $T \times K$ matrix of observations on all the predetermined
variables in the system; $Z_i$ and $\bar{Z}_i$ are $T \times K_i$ and $T \times (K - K_i)$
matrices of observations, respectively, on the included and excluded
predetermined variables in the i-th equation (2.1); $\pi^*$ is a
$K$-dimensional vector of reduced form coefficients subdivided conformably
with $Z$; $\Pi_i^*$ is a $K \times G_i$ matrix of reduced form coefficients
subdivided conformably with $Z$; $v$ is a $T$-dimensional vector and $V_i$ is
a $T \times G_i$ matrix of disturbances.

Without losing any generality, we may assume

(2.4)         $Z_i' \bar{Z}_i = 0 .$

Each row of $(v \;\; V_i)$ is independently normally distributed with mean 0
and (nonsingular) covariance matrix

(2.5)         $\begin{pmatrix} \omega & \omega_1' \\ \omega_1 & \Omega_{11} \end{pmatrix} .$

If we post-multiply (2.3) by $-\beta_i$ and add it to (2.2), we have

(2.6)         $u_i = v - V_i \beta_i .$

In order that (2.1) be properly written with $\bar{Z}_i$ omitted,

(2.7)         $\begin{pmatrix} \pi_i \\ \bar{\pi}_i \end{pmatrix} = \begin{pmatrix} \Pi_i & I \\ \bar{\Pi}_i & 0 \end{pmatrix} \begin{pmatrix} \beta_i \\ \gamma_i \end{pmatrix} .$

If $\bar{\pi}_i = \bar{\Pi}_i \beta_i$ permits a unique solution for $\beta_i$, then (2.1) is
said to be identifiable.

We define the minimum variance ratio for the i-th equation as

(2.8)         $\lambda_i = \frac{(y - Y_i\hat{\beta}_i)' \bar{P}_{Z_i} (y - Y_i\hat{\beta}_i)}{(y - Y_i\hat{\beta}_i)' \bar{P}_{Z} (y - Y_i\hat{\beta}_i)} ,$

where $\bar{P}_F = I - P_F = I - F(F'F)^{-1}F'$ and $\hat{\beta}_i$ is the LIML estimator
of $\beta_i$. Note that $\lambda_i$ never falls below unity.
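
For concreteness, the ratio in (2.8) can be obtained without searching over $\beta_i$: writing $A = (y, Y_i)$ and $b = (1, -\beta_i')'$, the ratio equals $b'W_1 b / b'Wb$ with $W_1 = A'\bar{P}_{Z_i}A$ and $W = A'\bar{P}_{Z}A$, so its minimum is the smallest root of $\det(W_1 - \lambda W) = 0$. The following is a sketch under assumptions: NumPy is assumed to be available, and the function names and the simulated data are hypothetical.

```python
import numpy as np


def residual_maker(F):
    """Return I - F (F'F)^{-1} F', the projection off the columns of F."""
    T = F.shape[0]
    return np.eye(T) - F @ np.linalg.solve(F.T @ F, F.T)


def min_variance_ratio(y, Y_i, Z_i, Z):
    """Smallest root of det(W1 - lambda W) = 0, i.e. the minimum of the ratio (2.8)."""
    A = np.column_stack([y, Y_i])
    W1 = A.T @ residual_maker(Z_i) @ A   # residual moments given the included Z_i
    W = A.T @ residual_maker(Z) @ A      # residual moments given all of Z
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    S = L_inv @ W1 @ L_inv.T             # symmetric, same roots as W^{-1} W1
    return float(np.linalg.eigvalsh(S)[0])  # eigvalsh sorts in ascending order


# Simulated data (illustrative only): T = 40, K = 4 predetermined variables,
# K_i = 2 of them included, and G_i = 1 right-hand-side endogenous variable.
rng = np.random.default_rng(0)
T = 40
Z = rng.normal(size=(T, 4))
Z_i = Z[:, :2]
V = rng.normal(size=(T, 2))
Y_i = Z @ np.array([[1.0], [0.5], [-1.0], [2.0]]) + 0.3 * V[:, [1]]
y = 0.8 * Y_i[:, 0] + Z_i @ np.array([1.0, -0.5]) + 0.3 * V[:, 0]

lam = min_variance_ratio(y, Y_i, Z_i, Z)
lam_just = min_variance_ratio(y, Y_i, Z[:, :3], Z)  # just-identified: K - K_i = G_i
```

Because $\bar{P}_{Z_i} - \bar{P}_{Z}$ is positive semidefinite, the computed ratio cannot fall below one, and in the just-identified case it equals one up to rounding error.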

3.   Decision  Rule  by  the  Akaike  Information  Criterion  (AIC) 

In this section, we first derive the AIC for a structural equation,
which provides us with a decision rule to identify the most adequate
structural equation from a given set of alternatives. Then we consider
the implication of the MAIC procedure in the context of conventional
hypothesis testing. For this purpose extensive use is made
of the small-$\sigma$ asymptotic expansion originated by Kadane [4,5].

The AIC is generally defined for a particular model with a well-
defined likelihood function as follows:

(3.1)         AIC = -2 log (the maximized likelihood) + 2 (number of
              parameters) .

The concentrated likelihood function for the i-th structural equation
(2.1) is

(3.2)         $\text{constant} - \frac{T}{2} \log \lambda_i ,$

where $\lambda_i$ is the minimum variance ratio for the i-th model. (See Koopmans
and Hood [3], pp. 166-8.) Hence we have the following propositions.

Proposition 3.1: The AIC for the i-th structural equation is

(3.3)         $\mathrm{AIC}(i) = T \log \lambda_i + 2(K_i + G_i) .$

The first term is interpreted to measure the degree of goodness-of-
fit; it decreases along with the augmentation of the model. More
precisely, if we augment the right-hand-side variables in (2.1), $\lambda_i$
approaches one, and it is exactly equal to one whenever (2.1) is just-
identified. The second term stands for the penalty for losing degrees
of freedom by increasing the number of unknown parameters. Hence the
AIC is said to be a statistic that takes into account the trade-off
between the two desirable properties of statistical models; i.e.,
goodness-of-fit and parsimonious use of parameters.

The  MAIC  procedure  is  described  as  follows: 

Proposition  3.2  (Decision  Rule):   We  choose  the  j-th  structural  equation 
if  and  only  if 

              $\mathrm{AIC}(j) \le \mathrm{AIC}(i)$    for i = 1, 2, ..., N .
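
The decision rule of Proposition 3.2 can be sketched as follows, reading (3.3) as $T \log \lambda_i + 2(K_i + G_i)$. The sample size and the four candidate equations below are hypothetical; the last candidate is taken to be just-identified, so that its minimum variance ratio equals one.

```python
import math


def aic_structural(T, lam, K_i, G_i):
    """AIC(i) = T log(lambda_i) + 2 (K_i + G_i), as in (3.3)."""
    return T * math.log(lam) + 2 * (K_i + G_i)


def maic_equation(T, candidates):
    """Return (j, scores): the 1-based index of the smallest AIC and all AIC values.

    `candidates` is a list of (lambda_i, K_i, G_i) tuples.
    """
    scores = [aic_structural(T, lam, K, G) for lam, K, G in candidates]
    return 1 + scores.index(min(scores)), scores


# Hypothetical alternatives for T = 30 observations.
j, scores = maic_equation(
    30, [(1.90, 1, 1), (1.35, 2, 1), (1.30, 4, 1), (1.00, 6, 2)]
)
```

In this illustration the second equation is chosen: the just-identified equation attains the best possible fit ($\log \lambda = 0$) but its penalty of 16 exceeds the total AIC of the second equation.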

Now we consider the statistical implication of the MAIC
procedure. Let us confine ourselves to the case when N = 2; i.e., two
alternative equations, say M1 and M2, are given. We assume that M1 is
nested in M2. We note that in conventional hypothesis testing M1 is
taken as a null hypothesis and M2 as an alternative hypothesis.
According to Proposition 3.2, we choose M1 over M2 if

(3.4)         $T \log (\lambda_1/\lambda_2) < 2 p_{12}$  with  $p_{12} = K_2 + G_2 - K_1 - G_1 ,$

and vice versa. The statistic $T \log (\lambda_1/\lambda_2)$ is asymptotically
distributed as $\chi^2(p_{12})$ when M1 is true (Anderson and Rubin [2]). Then
the decision rule defined by (3.4) is asymptotically equivalent to the
classical pre-test procedure with significance levels given in Table 3.1.

Table 3.1:   Significance Levels Implied by the MAIC Procedure

    $p_{12}$     1     2     3     4     5
    %           17    16    15     8     7

The significance level is fixed at, for example, 5% or 10% in a
conventional pre-test regardless of the value of $p_{12}$. However, the
MAIC procedure adapts it to the degree of freedom.
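
Under the chi-square approximation stated for (3.4), the level implied by the MAIC rule is the upper tail probability $P(\chi^2(p_{12}) > 2p_{12})$. The sketch below evaluates this tail for integer degrees of freedom using only the standard library; the helper name `chi2_sf` is ours. The values so obtained fall from about 16 percent at $p_{12} = 1$ to about 7.5 percent at $p_{12} = 5$; tabulated entries that rest on a finite-sample or otherwise different approximation need not agree with them exactly.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with integer df > x) for x > 0.

    Uses Q(a + 1, z) = Q(a, z) + z^a e^{-z} / Gamma(a + 1) with z = x / 2,
    started from Q(1, z) = e^{-z} (even df) or Q(1/2, z) = erfc(sqrt(z)) (odd df).
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Level implied by rejecting M1 when T log(lambda_1 / lambda_2) >= 2 p12.
implied = {p: chi2_sf(2.0 * p, p) for p in range(1, 6)}
```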

A more precise finite-sample distribution of the relevant statistic
was given by Kadane [6]. Theorem 2 of Kadane [6] is worth citing as a
lemma:

Lemma 3.1 (Kadane): As $\sigma$ goes to zero

(3.5)         $\frac{T-K_2-G_2}{p_{12}} \left( \frac{\lambda_1}{\lambda_2} - 1 \right) \to F(p_{12}, T-K_2-G_2) ,$

if M1 is true.

Combining (3.4) and (3.5) yields a decision rule such that if

(3.6)         $F_{12} < \frac{T-K_2-G_2}{p_{12}} \left[ \exp \left( \frac{2p_{12}}{T} \right) - 1 \right]$

or approximately

(3.7)         $F_{12} < 2\,\frac{T-K_2-G_2}{T} ,$

we choose M1, where

(3.8)         $F_{12} = \frac{T-K_2-G_2}{p_{12}} \left( \frac{\lambda_1}{\lambda_2} - 1 \right) .$

If (3.6) does not hold, we choose M2.
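
The critical points in (3.6) and (3.7) are simple to evaluate. As a sketch, the dimensions below are those of the wage-equation comparison in Section 6, where T = 21 and $p_{12}$ = 2; the split $K_2 = 3$, $G_2 = 1$ is our reading of the larger equation there, and only the sum $K_2 + G_2 = 4$ matters. The function names are hypothetical.

```python
import math


def maic_critical_point(T, K2, G2, p12):
    """Right-hand side of (3.6): (T - K2 - G2) / p12 * [exp(2 p12 / T) - 1]."""
    return (T - K2 - G2) / p12 * math.expm1(2.0 * p12 / T)


def maic_critical_point_approx(T, K2, G2):
    """Right-hand side of (3.7), obtained from exp(x) - 1 ~ x for small x."""
    return 2.0 * (T - K2 - G2) / T


exact = maic_critical_point(T=21, K2=3, G2=1, p12=2)
approx = maic_critical_point_approx(T=21, K2=3, G2=1)
```

With these inputs the exact point agrees to three decimals with the MAIC critical point of 1.784 quoted in Section 6, while the approximation (3.7) is somewhat smaller because $\exp(x) - 1 > x$.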

It may be of some interest to compute the critical points of the
MAIC procedure and examine the implied significance levels on the basis
of the approximate F-distribution. However, this will lead us to
virtually the same results that Sawa [8] has obtained with regard to
linear regression. As usual, Kadane's small-$\sigma$ asymptotics justify
treating a structural equation as if it were a linear regression if
the disturbance variance is relatively small.

4.   Specification  Error  and  Non-Central  F-Distributions 

In this section we give a definition of a specification error
occurring in a structural equation. In most practical situations it is
quite likely that all of the alternative equations are incorrectly
specified. Therefore, it would be worthwhile to derive the distribution
of the AIC statistic when every alternative is more or less
misspecified.

Definition 4.1: The structural equation (2.1) is said to be
incorrectly specified if

(4.1)         $\begin{pmatrix} \pi_i \\ \bar{\pi}_i \end{pmatrix} = \begin{pmatrix} \Pi_i & I \\ \bar{\Pi}_i & 0 \end{pmatrix} \begin{pmatrix} \beta_i \\ \gamma_i \end{pmatrix} + \sigma \eta_i ,$

where $\eta_i$ is a column vector with at least one nonzero element among
the last $K - K_i$ elements.

We note that (2.7) is an a priori restriction on the reduced-form
coefficients, which must be taken into account when we maximize the
likelihood function to obtain the limited information maximum likelihood
estimate (Anderson and Rubin [2]). In order to identify a structural
equation, we need to impose these a priori restrictions, even if we are
uncertain about their validity. In any case, our a priori
knowledge about the economy is described in terms of restrictions
such as (2.7). Therefore, it would be reasonable to define
specification errors of a structural equation as in Definition 4.1.
The specification error term $\eta_i$ is multiplied by $\sigma$. This amounts to
assuming that the specification error is of comparable order of
magnitude with the disturbance terms in the equation.

Using (4.1), multiplying (2.2) by 1 and post-multiplying (2.3) by $-\beta_i$,
and adding, we can write the true structural equation as

(4.2)         $y = Y_i\beta_i + Z_i\gamma_i + \sigma Z\eta_i + \sigma u .$

To illuminate the implication of our defining specification errors as
such, let us suppose that the true structural equation includes some extra
endogenous and exogenous variables, say $Y_s$ and $Z_s$; i.e.,

(4.3)         $y = Y_i\beta_i + Z_i\gamma_i + \sigma Y_s\beta_s + \sigma Z_s\gamma_s + \sigma u .$

The neglected terms in the mis-specified equations (2.1) are assumed to be
of comparable order with disturbances. Substituting

(4.4)         $Y_s = Z\Pi_s + \sigma V_s$

into (4.3) yields

(4.5)         $y = Y_i\beta_i + Z_i\gamma_i + \sigma Z(\Pi_s\beta_s + I_s\gamma_s) + \sigma u ,$

where terms of $O(\sigma^2)$ are neglected, and $I_s$ is a $K \times s$ matrix such that
$Z_s = Z I_s$. Comparing (4.5) with (4.2), we see that

(4.6)         $\eta_i = \Pi_s\beta_s + I_s\gamma_s .$

Further we see that

(4.7)         $\sigma u = \sigma v - \sigma V_i\beta_i - \sigma^2 V_s\beta_s ,$

where $\sigma u$ is the disturbance of the true structural equation (4.3).
Combining this with (2.6), we have

(4.8)         $\sigma u_i = \sigma v - \sigma V_i\beta_i = \sigma u + O(\sigma^2) .$

Hence (2.1) and (4.3) have the same disturbance term up to order $O(\sigma)$
in the small-$\sigma$ asymptotics sense.

Lemma 3.1 was obtained assuming that the null model M1 is true.
However, if a true structural model is (4.2) or equivalently (4.3) in the
small-$\sigma$ sense, noncentrality parameters must be included in the
F-distribution.

Theorem 4.1: As $\sigma$ goes to zero

(4.9)         $\frac{T-K_2-G_2}{p_{12}} \left( \frac{\lambda_1}{\lambda_2} - 1 \right) \to F(p_{12}, T-K_2-G_2 \mid \delta_1^2 - \delta_2^2, \delta_2^2) ,$

where

(4.10)        $\delta_k^2 = \eta_k' Z' \bar{P}_{X_k} Z \eta_k$

and

(4.11)        $X_k = (Z\Pi_k^*, \; Z_k) , \quad k = 1, 2 .$

This theorem will be proved in the Appendix. The distribution in
(4.9) is a doubly noncentral F with noncentrality parameters $\delta_1^2 - \delta_2^2$
and $\delta_2^2$.

A version of the above theorem is as follows:

Lemma 4.2. As $\sigma$ goes to zero

(4.12)        $\frac{T-K}{p_{12}} (\lambda_1 - \lambda_2) \to F(p_{12}, T-K \mid \delta_1^2 - \delta_2^2, 0) .$

The proof will also be given in the Appendix. This distribution
includes only one noncentrality parameter $\delta_1^2 - \delta_2^2$. On the other hand,
we lose some degrees of freedom in the denominator since $K \ge K_2 + G_2$.

Noncentral F distributions derived in this section will be used in
the next section to obtain an unbiased decision rule for choosing one
from two alternative equations.

5.   Mallows'  Risk  and  Unbiased  Critical  Points  (UCP)

Following Mallows [7], we choose

(5.1)         $W_i = E \, \| y^0 - \hat{y}_i \|^2$

as a risk of postulating a structural equation (2.1), where

(5.2)         $y^0 = Z\pi^* + \sigma v^0$

is a vector of new independent observations on y for the same set of
predetermined variables Z; $\hat{y}_i$ is a vector of predicted values for y
based on the limited information maximum likelihood estimation of the
equation (2.1): i.e.,

(5.3)         $\hat{y}_i = Z_i\hat{\pi}_i + \bar{Z}_i\hat{\bar{\pi}}_i = P_Z y - P_{\bar{Z}_i} (y - Y_i\hat{\beta}_i) \hat{\rho}_i ,$

where

(5.4)         $\hat{\rho}_i = \frac{(y - Y_i\hat{\beta}_i)' \bar{P}_Z \, y}{(y - Y_i\hat{\beta}_i)' \bar{P}_Z (y - Y_i\hat{\beta}_i)} ,$

$\hat{\pi}_i$ and $\hat{\bar{\pi}}_i$ are the limited-information maximum likelihood (LIML)
estimators of $\pi_i$ and $\bar{\pi}_i$ (Anderson and Rubin [2]).

It was proposed by Takeuchi [10] to make use of the LIML estimators
of the reduced form coefficients to make predictions. The method is
appropriately called the single equation method of prediction in analogy
with the single equation method of estimation.

We now evaluate $W_i$ asymptotically as $\sigma$ goes to zero. The proof of
the following theorem will be given in the Appendix.

Theorem 5.1: As $\sigma$ goes to zero

(5.5)         $W_i = \sigma^2 \omega \left\{ T + (1 - r^2) \frac{T-K-1}{T-K-2} K + \left[ r^2 - \frac{1-r^2}{T-K-2} \right] (K_i + G_i) + \left[ r^2 + \frac{1-r^2}{T-K-2} \right] \delta_i^2 \right\} ,$

where $\delta_i^2$ is defined in (4.10), $\omega$ is defined in (2.5) and

(5.6)         $r^2 = \frac{[E(u'v)]^2}{E(u'u) \, E(v'v)}$

is the square of the correlation coefficient between the structural
disturbance u and the reduced-form disturbance v for y.

Suppose that we must choose one from two alternative structural
equations, say M1 and M2, the former of which is nested in the latter.
Let $W_1$ and $W_2$ be the risks of postulating the models M1 and M2,
respectively. Our decision is correct if we choose M1 when $W_1 \le W_2$ and
M2 otherwise. Approximating $W_j$ (j = 1, 2) by their small-$\sigma$ asymptotic
expansion given by (5.5), we can easily show that the inequality $W_1 \le W_2$
is equivalent to:

(5.7)         $\delta_1^2 - \delta_2^2 \le s \, p_{12} ,$

where

(5.8)         $s = \frac{(T-K-1) r^2 - 1}{(T-K-3) r^2 + 1} .$

We note that $0 \le s \le 1$ and $s = 1$ only when $r^2 = 1$, which is the case
when no endogenous variables are included in a structural equation (4.3).
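
The coefficient s and the risk comparison can be sketched as follows, reading (5.8) as $s = [(T-K-1)r^2 - 1] / [(T-K-3)r^2 + 1]$ and (5.7) as $\delta_1^2 - \delta_2^2 \le s\,p_{12}$. The function names and the values of $r^2$ and $\delta_k^2$ are hypothetical; T = 21 and K = 8 are the sample dimensions of Section 6.

```python
def s_coefficient(r2, T, K):
    """s in (5.8): ((T - K - 1) r^2 - 1) / ((T - K - 3) r^2 + 1)."""
    return ((T - K - 1) * r2 - 1.0) / ((T - K - 3) * r2 + 1.0)


def m1_has_smaller_risk(delta1_sq, delta2_sq, p12, r2, T, K):
    """Inequality (5.7): W_1 <= W_2 iff delta_1^2 - delta_2^2 <= s p12."""
    return delta1_sq - delta2_sq <= s_coefficient(r2, T, K) * p12


s_half = s_coefficient(0.5, T=21, K=8)  # (12 * 0.5 - 1) / (10 * 0.5 + 1) = 5/6
s_one = s_coefficient(1.0, T=21, K=8)   # (12 - 1) / (10 + 1) = 1

small_error = m1_has_smaller_risk(1.0, 0.2, p12=2, r2=0.5, T=21, K=8)
large_error = m1_has_smaller_risk(3.0, 0.2, p12=2, r2=0.5, T=21, K=8)
```

The smaller equation M1 carries the smaller risk only while its extra specification error $\delta_1^2 - \delta_2^2$ stays below $s\,p_{12}$. It may be worth noting that the expression for s turns negative when $r^2 < 1/(T-K-1)$, so the lower bound on s presupposes that $r^2$ is not that small.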

For simplicity let us confine ourselves to a class of decision
rules based on a ratio or difference of $\lambda_1$ and $\lambda_2$. That is, we decide
to choose M1 if $\lambda_1/\lambda_2$ (or $\lambda_1 - \lambda_2$) is less than some preassigned
constant c and choose M2 otherwise. Each decision rule is simply
characterized by a constant c, which we call the critical point. The MAIC
decision rule is a member of this class with c equalling the right-hand
side of (3.6). A decision based on Kadane's [6] preliminary test is also
a member of this class, the critical point of which is determined
depending on a preassigned significance level.

In what follows we will derive another member of the class which has
the desirable property of unbiasedness. The definition of unbiasedness is
as follows:

Definition 5.1: A decision rule with a critical point c* is said to be
unbiased, if

(5.9)         $P(F_{12} \le c^* \mid W_1 \le W_2) \ge .5$

and

(5.10)        $P(F_{12} > c^* \mid W_1 > W_2) > .5 ,$

where $F_{12}$ is a test statistic found in either (4.9) or (4.12).

In words, if a decision rule leads us to the correct choice with
probability greater than one-half, then it is said to be unbiased.

Since $F_{12}$ is continuously distributed, the conditions (5.9) and
(5.10) are equivalent to an equality:

(5.11)        $P(F_{12} \le c^* \mid W_1 = W_2) = .5 .$

From (5.7) we see that $W_1 = W_2$ if and only if

(5.12)        $\delta_1^2 - \delta_2^2 = s \, p_{12} .$

We note that the left-hand side of (5.12) is one of the noncentrality
parameters in the noncentral F distribution of $F_{12}$ (see
Theorem 4.1 and Lemma 4.2). The coefficient s depends on the unknown
squared correlation coefficient $r^2$ given by (5.6), which must be
estimated from sample observations.


Now we propose two decision rules, which are based on the small-$\sigma$
asymptotic distributions in Theorem 4.1 and Lemma 4.2, respectively.

Decision Rule I: We choose M1 if

(5.13)        $\frac{T-K_2-G_2}{p_{12}} \left( \frac{\lambda_1}{\lambda_2} - 1 \right) < c_1 ;$

otherwise we choose M2, where $c_1$ is the median of the noncentral F
distribution $F(p_{12}, T-K_2-G_2 \mid s\,p_{12}, 0)$, where s is the right-hand side
of (5.8) with $r^2$ substituted by its maximum likelihood estimate.

The noncentrality parameter in the denominator is equated to zero.
This is justifiable when $\delta_2^2 = O(\sigma^2)$. This simplifying assumption must
inevitably be made, because there is no way of estimating $\delta_2^2$, which
measures the distance of the postulated model M2 from the true equation
(4.3). It should be noted that equating $\delta_2^2$ to zero implies that the
augmented model M2 is virtually true in the small-$\sigma$ sense.

Decision Rule II: We choose M1 if

(5.14)        $\frac{T-K}{p_{12}} (\lambda_1 - \lambda_2) < c_2 ,$

where $c_2$ is the median of $F(p_{12}, T-K \mid s\,p_{12}, 0)$; we choose M2 if (5.14)
is not satisfied.

The small-$\sigma$ asymptotic distribution of the statistic on the left-
hand side of (5.14) is a singly noncentral F as was shown in Lemma 4.2.
Therefore, in order to justify Decision Rule II, we need not assume
that the augmented model is true. In this sense Decision Rule II
might be preferred to Decision Rule I, which is based on the strong
assumption that the augmented model is true in the small-$\sigma$ sense.
However, it would be fair to note that in large econometric models K is
far greater than $K_2 + G_2$ and hence the degree of freedom in the
denominator is drastically reduced by switching from Decision Rule I to
Rule II.
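
The critical points $c_1$ and $c_2$ are medians of singly noncentral F distributions, which can be found numerically. The sketch below is one possible way to do so with the standard library only, and it is not the computational method behind the tables of this paper, which is not described here. It assumes the usual convention in which the noncentrality parameter is the sum of squared means, represents the distribution as a Poisson mixture of incomplete beta functions, and solves for the median by bisection. All function names are ours.

```python
import math


def betainc(a, b, x):
    """Regularized incomplete beta I_x(a, b) by a continued fraction (modified Lentz)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # symmetry gives faster convergence
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x)) / a
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        for num in (m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return front * h


def ncf_cdf(x, p, n, nc):
    """CDF of the noncentral F(p, n) with numerator noncentrality nc."""
    if x <= 0.0:
        return 0.0
    y = p * x / (n + p * x)
    weight = math.exp(-nc / 2.0)  # Poisson(nc / 2) probability of j = 0
    total = 0.0
    for j in range(500):
        if weight < 1e-17 and j > nc / 2.0:
            break  # remaining Poisson weights are negligible
        total += weight * betainc(p / 2.0 + j, n / 2.0, y)
        weight *= (nc / 2.0) / (j + 1)
    return total


def unbiased_critical_point(p12, n, s):
    """Median of F(p12, n | s * p12, 0), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while ncf_cdf(hi, p12, n, s * p12) < 0.5:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ncf_cdf(mid, p12, n, s * p12) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c1 = unbiased_critical_point(p12=2, n=17, s=0.7)  # Rule I with n = T - K2 - G2
c2 = unbiased_critical_point(p12=2, n=13, s=0.7)  # Rule II with n = T - K
```

The inputs correspond to the comparison in Section 6 ($p_{12}$ = 2, T = 21, K = 8, $K_2 + G_2$ = 4, s = 0.7). Under the stated convention the computed medians should land close to the critical points quoted there, though agreement to the last digit is not to be expected from a different numerical method.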

6.   Numerical  Example 

The unbiased critical points (UCP) are computed and tabulated in
Tables 2 and 3 for various values of $p_{12}$, $n = T-K_2-G_2$ (or $T-K$), and
s = 0.2(0.2)0.8. We observe that these UCP's are smaller than Sawa
and Takeuchi's [9] UCP's for linear regression models. Significance
levels implied by the unbiased decision rule are also tabulated in
Tables 4 and 5.

As an example of application, we compare two alternative structural
wage functions in Klein's model I (T = 21, K = 8):

              M1:   W = 1.37 + 0.58X ,                           $\lambda_1 = 3.25$
(6.1)
              M2:   W = 1.50 + 0.44X + 0.13t + 0.146$X_{-1}$ ,    $\lambda_2 = 2.47$

where W is the private wage bill, X is the private total production,
and t is the time trend. The estimates of s are 0.87 for M1 and 0.68
for M2. Klein chose M2 as his wage function.

We base our decision on either

(6.2)         $F_{12} = \frac{T-K_2-G_2}{p_{12}} \left( \frac{\lambda_1}{\lambda_2} - 1 \right) = 2.69$

or

(6.3)         $F_{12}^* = \frac{T-K}{p_{12}} (\lambda_1 - \lambda_2) \approx 5.1 .$

The MAIC critical point is 1.784; the unbiased critical points when
s = 0.7 are 1.303 for $F_{12}$ and 1.320 for $F_{12}^*$; the critical point of
Kadane's [6] 5% level pre-test is 3.59. Therefore, our decision rules
developed in this paper strongly support Klein's choice of the wage
function, while the conventional pre-test procedure leads us to the
choice of the null model M1.
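
The comparison above can be retraced numerically. In the sketch below the critical points are taken as quoted in the text, the minimum variance ratios are the two reported values 3.25 and 2.47, with the larger one assigned to the restricted equation M1 as nesting requires (this is also the assignment under which (6.2) reproduces 2.69), and $K_2 = 3$, $G_2 = 1$, $p_{12} = 2$ are our reading of the two wage equations. The function names are hypothetical.

```python
def f12(T, K2, G2, p12, lam1, lam2):
    """Statistic (6.2): (T - K2 - G2) / p12 * (lambda_1 / lambda_2 - 1)."""
    return (T - K2 - G2) / p12 * (lam1 / lam2 - 1.0)


def f12_star(T, K, p12, lam1, lam2):
    """Statistic (6.3): (T - K) / p12 * (lambda_1 - lambda_2)."""
    return (T - K) / p12 * (lam1 - lam2)


def choose(statistic, critical_point):
    """Choose M1 when the statistic falls below the critical point, else M2."""
    return "M1" if statistic < critical_point else "M2"


T, K = 21, 8
K2, G2, p12 = 3, 1, 2      # assumed dimensions of M2 and of the comparison
lam1, lam2 = 3.25, 2.47    # assumed assignment: M1 has the larger ratio

F = f12(T, K2, G2, p12, lam1, lam2)
F_star = f12_star(T, K, p12, lam1, lam2)

decisions = {
    "MAIC": choose(F, 1.784),
    "unbiased rule I": choose(F, 1.303),
    "unbiased rule II": choose(F_star, 1.320),
    "5% pre-test": choose(F, 3.59),
}
```

Computed from the rounded ratios, the first statistic agrees with 2.69 to within rounding, and the four decisions match the conclusion stated above: the MAIC and both unbiased rules select M2, whereas the 5% pre-test retains M1.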
Appendix


Kadane [4] is followed for proving Theorem 4.1 and Lemma 4.2. The
subscript i in each lemma as well as theorem is for i = 1, ..., N.


Lemma A.1: $\lambda_i = O_p(1)$ as $\sigma \to 0$.

Proof:

              $1 \le \lambda_i = \min_{\beta} \frac{(y - Y_i\beta)' \bar{P}_{Z_i} (y - Y_i\beta)}{(y - Y_i\beta)' \bar{P}_{Z} (y - Y_i\beta)} .$

However, $y - Y_i\beta_i = Z_i\gamma_i + \sigma Z\eta_i + \sigma u$ from (4.2). Hence

(A.1)         $1 \le \lambda_i \le \frac{(u + Z\eta_i)' \bar{P}_{Z_i} (u + Z\eta_i)}{u' \bar{P}_{Z} u} .$

QED.


Lemma A.2: For any k-class estimator

              $\begin{pmatrix} \hat{\beta}_i \\ \hat{\gamma}_i \end{pmatrix} = \begin{pmatrix} \beta_i \\ \gamma_i \end{pmatrix} + \sigma (X_i'X_i)^{-1} X_i' (u_i + Z\eta_i) + O_p(\sigma^2)$

if $k = O_p(1)$. [In particular, $k = 1$ and $k = \lambda_i$.]

The proof is straightforward from Lemma 2 of Kadane [4].


Lemma A.3:

              $\lambda_i = \frac{(u + Z\eta_i)' \bar{P}_{X_i} (u + Z\eta_i)}{u' \bar{P}_{Z} u} + O_p(\sigma) .$

Proof:

   $\bar{P}_{Z_i} (y - Y_i\hat{\beta}_i) = \bar{P}_{Z_i} (y - Y_i\hat{\beta}_i - Z_i\hat{\gamma}_i)$

      $= \bar{P}_{Z_i} \{ y - Y_i\beta_i - Z_i\gamma_i - \sigma P_{X_i} (u_i + Z\eta_i) \} + O_p(\sigma^2)$   (from Lemma A.2)

      $= \sigma \bar{P}_{Z_i} \bar{P}_{X_i} (u_i + Z\eta_i) + O_p(\sigma^2)$   (from (A.1))

      $= \sigma \bar{P}_{X_i} (u_i + Z\eta_i) + O_p(\sigma^2)$

      $= \sigma \bar{P}_{X_i} (u + Z\eta_i) + O_p(\sigma^2) ,$

since $u_i = u + O_p(\sigma)$ from (4.8).

Similarly,

   $\bar{P}_{Z} (y - Y_i\hat{\beta}_i) = \sigma \bar{P}_{Z} \bar{P}_{X_i} (u + Z\eta_i) + O_p(\sigma^2) = \sigma \bar{P}_{Z} u + O_p(\sigma^2) .$

QED.


Proof of Theorem 4.1:

By Lemma A.3, we have

              $\frac{\lambda_1}{\lambda_2} = \frac{(u + Z\eta_1)' \bar{P}_{X_1} (u + Z\eta_1)}{(u + Z\eta_2)' \bar{P}_{X_2} (u + Z\eta_2)} + O_p(\sigma) .$

However,

(A.2)         $(u + Z\eta_k)' \bar{P}_{X_k} (u + Z\eta_k) \sim \chi^2(T - K_k - G_k \mid \delta_k^2) , \quad k = 1, 2 ,$

and $(\bar{P}_{X_1} - \bar{P}_{X_2})$ is orthogonal to $\bar{P}_{X_2}$.

QED.


Proof of Lemma 4.2:

By Lemma A.3,

              $\lambda_1 - \lambda_2 = \frac{(u + Z\eta_1)' \bar{P}_{X_1} (u + Z\eta_1) - (u + Z\eta_2)' \bar{P}_{X_2} (u + Z\eta_2)}{u' \bar{P}_{Z} u} + O_p(\sigma) ,$

where $\bar{P}_{Z}$ and $(\bar{P}_{X_1} - \bar{P}_{X_2})$ are orthogonal. Also (A.2) holds for each
term in the numerator of the ratio. On the other hand,

              $u' \bar{P}_{Z} u \sim \chi^2(T - K) .$

QED.


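Proof of Theorem 5.1:

From (5.2) and (5.3)

   $W_i = E \| y^0 - \hat{y}_i \|^2 = E \| Z_i (\pi_i - \hat{\pi}_i) + \bar{Z}_i (\bar{\pi}_i - \hat{\bar{\pi}}_i) + \sigma v^0 \|^2 .$

Expectations of cross products between any two of the three terms in the
above equation are zero since $v$ and $v^0$ are independently distributed,
and $Z_i$ and $\bar{Z}_i$ are orthogonal. Then

(A.3)   $W_i = \sigma^2 E\|v^0\|^2 + E\|Z_i (\pi_i - \hat{\pi}_i)\|^2 + E\|\bar{Z}_i (\bar{\pi}_i - \hat{\bar{\pi}}_i)\|^2 .$

It is easy to show

   $\sigma^2 E\|v^0\|^2 = \sigma^2 T \omega .$

From (2.2), (5.4), and the orthogonality between $Z_i$ and $\bar{Z}_i$,

   $E\|Z_i (\pi_i - \hat{\pi}_i)\|^2 = \sigma^2 E\|P_{Z_i} v\|^2 = \sigma^2 K_i \omega .$

Hereafter we derive the expectation of the third term in (A.3).
From (2.2), (5.3), (5.4), and the orthogonality between $Z_i$ and $\bar{Z}_i$, we
have

(A.4)   $E\|\bar{Z}_i (\hat{\bar{\pi}}_i - \bar{\pi}_i)\|^2 = E\|\sigma P_{\bar{Z}_i} v - P_{\bar{Z}_i} (y - Y_i\hat{\beta}_i) \hat{\rho}_i\|^2 .$

Following the proof of Lemma A.3,

   $(y - Y_i\hat{\beta}_i)' \bar{P}_Z \, y = \sigma^2 (u + Z\eta_i)' \bar{P}_Z v + O_p(\sigma^3) = \sigma^2 u' \bar{P}_Z v + O_p(\sigma^3) ;$

   $(y - Y_i\hat{\beta}_i)' \bar{P}_Z (y - Y_i\hat{\beta}_i) = \sigma^2 u' \bar{P}_Z u + O_p(\sigma^3) .$

Then

(A.5)   $\hat{\rho}_i = \frac{u' \bar{P}_Z v}{u' \bar{P}_Z u} + O_p(\sigma) .$

Similarly following the proof of Lemma A.3,

(A.6)   $P_{\bar{Z}_i} (y - Y_i\hat{\beta}_i) = \sigma P_{\bar{Z}_i} \bar{P}_{X_i} (u + Z\eta_i) + O_p(\sigma^2) .$

Using (A.5) and (A.6), (A.4) is

        $\sigma^2 E \left\| P_{\bar{Z}_i} v - P_{\bar{Z}_i} \bar{P}_{X_i} (u + Z\eta_i) \frac{u' \bar{P}_Z v}{u' \bar{P}_Z u} \right\|^2 + O(\sigma^3)$

(A.7)   $= \sigma^2 E \left\| \left\{ P_{\bar{Z}_i} v - P_{\bar{Z}_i} \bar{P}_{X_i} u \frac{u' \bar{P}_Z v}{u' \bar{P}_Z u} \right\} - \left\{ P_{\bar{Z}_i} \bar{P}_{X_i} Z\eta_i \frac{u' \bar{P}_Z v}{u' \bar{P}_Z u} \right\} \right\|^2 + O(\sigma^3) .$

Expectation of the cross product between the first and the second
brackets is zero since only odd moments are included therein. In order
to take expectations of squares of the first and the second brackets, we
introduce a vector random variable w which is independent of u:

(A.8)   $w = v - \rho u ,$   where   $\rho = E(u'v) \, E(u'u)^{-1} .$

The expectation of the square of the first bracket in (A.7) is

(A.9)   $\sigma^2 E\|P_{\bar{Z}_i} v\|^2 + \sigma^2 E \left\| P_{\bar{Z}_i} \bar{P}_{X_i} u \frac{u' \bar{P}_Z v}{u' \bar{P}_Z u} \right\|^2 - 2\sigma^2 E \, \mathrm{trace} \left\{ P_{\bar{Z}_i} \bar{P}_{X_i} u v' P_{\bar{Z}_i} \frac{u' \bar{P}_Z v}{u' \bar{P}_Z u} \right\} .$

Then we have for the first term of (A.9)

   $E\|P_{\bar{Z}_i} v\|^2 = \omega (K - K_i) ,$

and for the second term of (A.9),

   $\mathrm{trace} \, E \left\{ P_{\bar{Z}_i} \bar{P}_{X_i} u u' \bar{P}_{X_i} P_{\bar{Z}_i} \frac{(u' \bar{P}_Z v)^2}{(u' \bar{P}_Z u)^2} \right\}$

   $= \mathrm{trace} \, E \left\{ P_{\bar{Z}_i} \bar{P}_{X_i} u u' \bar{P}_{X_i} P_{\bar{Z}_i} \frac{(u' \bar{P}_Z w)^2}{(u' \bar{P}_Z u)^2} \right\} + \rho^2 \, \mathrm{trace} \, E \{ P_{\bar{Z}_i} \bar{P}_{X_i} u u' \bar{P}_{X_i} P_{\bar{Z}_i} \}$   (from (A.8))

   $= (\omega - \rho^2) \, \mathrm{trace} \, E \{ P_{\bar{Z}_i} \bar{P}_{X_i} u u' \bar{P}_{X_i} P_{\bar{Z}_i} / u' \bar{P}_Z u \} + \rho^2 \, \mathrm{trace} (P_{\bar{Z}_i} \bar{P}_{X_i} P_{\bar{Z}_i})$

   $= (\omega - \rho^2) \, \mathrm{trace} \left\{ P_{\bar{Z}_i} \bar{P}_{X_i} \left[ \frac{1}{T-K-2} P_Z + \frac{1}{T-K} (I - P_Z) \right] \bar{P}_{X_i} P_{\bar{Z}_i} \right\} + \rho^2 (K - K_i - G_i)$

   $= \frac{\omega - \rho^2}{T-K-2} (K - K_i - G_i) + \rho^2 (K - K_i - G_i) ,$

since

(A.10)  $\bar{P}_{X_i} P_{\bar{Z}_i} = P_{\bar{Z}_i} - P_{\bar{Z}_i \bar{\Pi}_i} ,$   and   $E ww' = (\omega - \rho^2) I .$

Finally for the third term of (A.9),

   $E \, \mathrm{trace} \left\{ P_{\bar{Z}_i} \bar{P}_{X_i} u v' P_{\bar{Z}_i} \frac{u' \bar{P}_Z v}{u' \bar{P}_Z u} \right\}$

   $= E \, \mathrm{trace} \{ P_{\bar{Z}_i} \bar{P}_{X_i} u u' \bar{P}_Z w w' P_{\bar{Z}_i} / u' \bar{P}_Z u \} + \rho^2 E \, \mathrm{trace} \{ P_{\bar{Z}_i} \bar{P}_{X_i} u u' P_{\bar{Z}_i} \}$

   $= \rho^2 \, \mathrm{trace} (P_{\bar{Z}_i} \bar{P}_{X_i} P_{\bar{Z}_i})$

   $= \rho^2 (K - K_i - G_i) .$   (from (A.10))

Similarly the expectation of the square of the second bracket in (A.7) is

(A.11)  $E \left\| P_{\bar{Z}_i} \bar{P}_{X_i} Z\eta_i \frac{u' \bar{P}_Z v}{u' \bar{P}_Z u} \right\|^2$

   $= \eta_i' Z' \bar{P}_{X_i} P_{\bar{Z}_i} \bar{P}_{X_i} Z\eta_i \, E \left( \frac{u' \bar{P}_Z v}{u' \bar{P}_Z u} \right)^2$

   $= \delta_i^2 \left\{ \rho^2 + E \left( \frac{u' \bar{P}_Z w}{u' \bar{P}_Z u} \right)^2 \right\}$   (from (A.8))

   $= \delta_i^2 \{ \rho^2 + (\omega - \rho^2) E (u' \bar{P}_Z u)^{-1} \}$   (from (A.10))

   $= \delta_i^2 \left\{ \rho^2 + \frac{\omega - \rho^2}{T-K-2} \right\} ,$

since

   $\eta_i' Z' \bar{P}_{X_i} P_{\bar{Z}_i} \bar{P}_{X_i} Z\eta_i = \eta_i' Z' \bar{P}_{X_i} Z\eta_i = \eta_i' Z' (P_{\bar{Z}_i} - P_{\bar{Z}_i \bar{\Pi}_i}) Z\eta_i .$   (from (A.10))

Combining the above terms, we have

   $W_i = \sigma^2 \left\{ T\omega + K\omega + \left[ \frac{\omega - \rho^2}{T-K-2} - \rho^2 \right] (K - K_i - G_i) + \left[ \rho^2 + \frac{\omega - \rho^2}{T-K-2} \right] \delta_i^2 \right\}$

   $= \sigma^2 \left\{ T\omega + (\omega - \rho^2) \frac{T-K-1}{T-K-2} K + \left[ \rho^2 - \frac{\omega - \rho^2}{T-K-2} \right] (K_i + G_i) + \left[ \rho^2 + \frac{\omega - \rho^2}{T-K-2} \right] \delta_i^2 \right\} .$

Since $r^2 = \rho^2 / \omega$, Theorem 5.1 is proved.

QED.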